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Summary. We consider the problem of detecting the periodic part of a function given 
the observations of some input/output tuples (xi,yi), 1 < % < n. As they are known for 
being powerful tools for dealing with such data, our approach is based on Gaussian process 
regression models which are closely related to reproducing kernel Hilbert spaces (RKHS). 
The latter offer a powerful framework for decomposing covariance functions as the sum of 
periodic and aperiodic kernels. This decomposition allows for the creation of sub-models 
which capture the periodic nature of the signal and its complement. To quantify the pe- 
riodicity of the signal, we derive a periodicity ratio which reflects the uncertainty in the 
fitted sub-models. Although the method can be applied to many kernels, we give a spe- 
cial emphasis to the Matern family, from the expression of the RKHS inner product to 
the implementation of the associated periodic kernels in a Gaussian process toolkit. The 
efficiency of the proposed method is finally illustrated on a biological case study where we 
detect periodically expressed genes. 

Keywords. Harmonic analysis, RKHS, Kriging, Matern kernels. 



1 Introduction 

The periodic behaviour of natural phenomena arises at many scales, from the small wave- 
length of electromagnetic radiations to the movements of planets. The mathematical study 
of natural cycles can be traced back to the XIX century with Thompson's harmonic analy- 
sis for predicting tides [Thomson, 1878] and Schuster's investigations on the periodicity of 
sunspots [Schuster, 1898]. Amongst the methods that have been considered for detecting 
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and extracting the periodic trend, one can cite harmonic analysis [Hartley, 1949], folding 
methods [Stellingwerf, 1978, Leahy et al., 1983] which are mostly used in astrophysics and 
periodic autoregressive models [Troutman, 1979, Vecchia, 1985]. In this article, we will 
focus on the application of harmonic analysis in reproducing kernel Hilbert spaces (RKHS) 
and on the consequences for Gaussian Process (GP) modelling. 

Harmonic analysis is based on the projection of a function on a basis of periodic functions. 
For example, a natural method for extracting the 27r-periodic trend of a function / is to 
decompose it in a Fourier series: 

f(x) — > f p (x) = ai sin(x) + a 2 cos(:r) + 03 sin(2x) + a 4 cos(2:r) + . . . (1) 

where the coefficients are given, up to a normalising constant, by the L 2 inner product 
between / and the elements of the basis. However, the phenomenon under study is often 
observed in a limited number of points, which means that the value of f(x) is not known 
for all x but only for a small set of inputs {x±, . . . ,x n } called the observation points. With 
this limited knowledge of /, it is not possible to compute the integrals of the L 2 inner 
product so the coefficients a* cannot be obtained directly. 

A popular approach to overcome the fact that / is partially known is to build a math- 
ematical model m to approximate it. A good model m has to take into account as much 
information as possible about /. Typically, it interpolates / for the set of observation points 
m(xi) = f(xi) and its differentiability corresponds to the assumptions one can have about 
the regularity of /. The main body of literature tackling the issue of interpolating spatial 
data is scattered over three fields: (geo-)statistics [Matheron, 1963, Stein, 1999], functional 
analysis [Aronszajn, 1950, Berlinet and Thomas-Agnan, 2004] and machine learning [Ras- 
mussen and Williams, 2006]. In the first case, the solution of the interpolation corresponds 
to the conditional expectation of a Gaussian process Z and in the second, it is the in- 
terpolator with minimal norm in a particular Hilbert space T-L. As many authors pointed 
out (see for example Berlinet and Thomas-Agnan [2004], Scheuerer et al. [2011]), the two 
approaches are closely related. Both Z and T-L are based on a common object which is 
a positive definite function of two variables k(., .). In statistics, k corresponds to the co- 
variance of Z and for the functional counterpart, k is the reproducing kernel of T-L. From 
the interpolation or regularization point of view, the two approaches are equivalent since 
they lead to the same model m [Wahba, 1990]. Although we will focus hereafter on the 
RKHS framework to design periodic kernels, we will also take advantage of the powerful 
probabilistic interpretation offered by Gaussian processes. 
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A naive approach for extracting the periodic part of / given some observations would be 
to approximate it with a mathematical model m and to compute the Fourier coefficients 
of m. However, this method is not fully satisfactory since each step involves an orthogonal 
projection for a different norm. In other words, the construction of m and the computation 
of the coefficients are optimal, but not for the same criterion. As a result the periodic part 
obtained with this method cannot naturally be seen as a "best predictor" . To overcome this 
issue, we propose in this article to build the Fourier series using the RKHS inner product 
instead of the L 2 one. To do so, we extract the sub-RKHS Hp of periodic functions in % 
and model the periodic part of / by its orthogonal projection onto T-L p . The prediction 
then inherits from the probabilistic framework associated with RKHS and the percentage 
of periodicity of / can elegantly be estimated. 

The last part of this introduction, gives an overview of the RKHS framework and em- 
phasises the properties of the Matern family of kernels. In section 2, we focus on the 
construction of periodic kernels. Section 3 details the decomposition of GP models into 
periodic and aperiodic sub-model. These results allow us to introduce, in Section 4, a new 
criterion for measuring the periodicity of a signal. Finally, the last section illustrates the 
proposed approach on a biological case study where we detect, amongst the entire genome, 
the genes presenting a cyclic expression. This issue of detecting periodically expressed 
genes is the application that initially motivated the present work. 

The examples and the results presented in this article have been generated with the 
version 0.2 of the python Gaussian process toolbox GPy. This toolbox, in which we have 
implemented the periodic kernels discussed here, can be downloaded at http://github. 
com/Sheff ieldML/GPy. 

1.1 Approximation in Reproducing Kernel Hilbert Spaces 

The aim of this section is to introduce the notion of RKHS and to derive the expression 
of the best predictor. We will also briefly show how to construct a RKHS from any positive 
definite function. For a more details, we refer the reader to Berlinet and Thomas-Agnan 
[2004, chap. 1] and Aronszajn [1950]. 

Let H be a Hilbert space of real valued functions defined over DcR. H is said to be a 
RKHS if and only if there exist a function i(.,.):DxD-fR such that for all x e D 

(i) k(x,.) e U 
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(ii) V/Gft, f(x) = (f,k(x,.)) n . 

The function k satisfying these properties is unique and it is called the reproducing kernel 
of ft. 

Recalling that a function k is said to be positive semi-definite if Vm G IN, Va G K"\ Vx G 

m m 

^ didjkixi, Xj) > 0, (2) 
i=i i=i 

it can be shown that a reproducing kernel is necessarily a symmetric positive semi-definite 
(spd) function. Reciprocally, the Moore- Aronszajn theorem states that for all spd-function 
k on D x D, there exist only one RKHS of functions on D with k as reproducing kernel. 
A common approach is then to define a RKHS by specifying its reproducing kernel. As 
the covariance of a random process is also a spd-function, we will use interchangeably the 
words kernel, covariance function and reproducing kernel. 

To get an insight on the elements of the RKHS associated with a spd-function k, we first 
consider the space ft generated by finite combinations of k(x iy .): 

ft = ^y]aik(xi, .), cii G R, Xi G D, m G IN J . (3) 

Obviously, we have k(x, .) G H for all x G D so (i) is satisfied. Using the property that k 
is a spd-function, it is straightforward to show that 

' m to' \ to m' 

^2a i k{x i ,.),^2b j k{x j ,.)\ =^2^2a i b j k(x i ,x j ). (4) 
i=i j=i / H i=i j=i 

defines a valid inner product on _ft. One particular asset of this inner product is that k(x, .) 
satisfies (ii). Indeed, for all / = ^aj/c(xj, .) G ft and x G D we have 

TO TO 

(/, fc(a:, -)) H = a i( k ( x i' •)» = 5^ a ^( x *' x ) = ( 5 ) 

i=i i=i 

Although the properties (i) and (n) are fulfilled, ft is not a necessarily a RKHS since it 
may not a Hilbert space (it is not always complete). Let ft be the closure of ft and (., .) n 
the continuous extension of (.,.)# onto ft. Then ft is a Hilbert space and it can be shown 
that (i) and (ii) are still satisfied: ft is the only RKHS with kernel k. 
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We will now focus on how to take advantage of the RKHS framework to approximate a 
function / that is observed in a limited number of points. Let X = {x±, . . . ,x n } G D n be 
a set of points where the value y^ = f(xi) is known and y be the vector of y^. For a given 
RKHS H, the best interpolator m is defined as the interpolator with minimal norm: 

m = argmin (\\h\\ n I h(xi) = y u i e 1, . . . , n). (6) 
hen 

It can be shown that m corresponds to the orthogonal projection of / onto the space 
spanned by the k(xi, .): 

H x = span (k(xi, .), x^eX). (7) 

Let k(.) be the n x 1 vector of functions with general term (k(.))j = k(xi, .). This vector 
corresponds to a basis of Hx- The Gram matrix K associated to this basis has general 
term = (k(xi, .), k(xj, .)) n = k(xi,Xj). When K is invertible, it is straightforward to 
show that 

k x (x,y)=k T (x)K- 1 k(y) (8) 

satisfies (i) and (ii). Since Hx is a finite dimensional space it is necessarily complete so 
Hx is a RKHS with reproducing kernel kx- The orthogonal projection of / onto Hx is 
then: 

m(x) = (k x (x, .), f) n = k T (x)K- 1 (k(.), f) n = k r (x)K- 1 y. (9) 

In the geostatistical community, m is referred to as the Kriging mean. In the probabilistic 
framework, this expression corresponds to the conditional expectation of a centred Gaussian 
process Z with covariance k knowing the observations. Furthermore, GP provide naturally 
some prediction variance for the model: 

m(x) = E[Z(x)\Z( Xi ) = Vl ] = k T (a:)K- 1 y 
v(x) = Vax[Z(x)\Z(xi) = yi] = k(x,x) — k T (x)'K~ 1 'k(x) 

One particular asset of Eqs. 9-10 is that the expressions of m, v only depends on k. As 
a result it is not necessary to derive the expression of the inner product generated by k 
to obtain the best predictor and any spd-function can be used directly to build models. 
However, a direct proof of the positive definiteness of a function is often intractable and a 
widespread approach is to use well known spd-functions such as the squared-exponential 
(i.e. Gaussian and radial basis function) or the spline kernel. The next section recalls some 
results about another interesting class of spd-functions: the Matern family. 
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1.2 The Matern class of kernels 

Matern kernels k are stationary spd-functions, which means that they only depend on 
the distance between the points they are evaluated at: k(x, y) = k(\x — y\)- They are often 
introduced by the spectral density of k [Stein, 1999]: 



(. 




Three parameters can be identified in this equation: v which tunes the differentiability of k, 
9 which corresponds to a lengthscale parameter and a 2 that is homogeneous to a variance. 
Note that all these parameters are positive reals. 

The actual expressions of the Matern kernels are simple when the parameter v is half- 
integer. For v = 1/2, 3/2, 5/2 we have 



f - y\ 

9 



k 1/2 (x,y) = cr 2 exp 

u < \ 2 ( , , V3\x-y\\ 

h/2\X, y)=<T 11 + I 



V3\x — y\ 

exp I 9 J (12) 



., , y/Elx — y\ 5b — y\ 2 \ ( y/E\x — y\ 
h/2\x, y) = a ( 1 + + — — — ) exp 



9 39 2 J I 9 

It can be seen that the parameters 9 and a 2 respectively correspond to a rescaling of 
the abscissa and ordinate axis. For v — 1/2 one can recognise the expression of the 
exponential kernel (i.e. the covariance of the Ornstein-Uhlenbeck process) and the limit 
case v — > oo corresponds to the squared exponential covariance function [Rasmussen and 
Williams, 2006]. 

One considerable asset of the Matern class of kernels is to have strong connections with 
various fields. For example, a Gaussian process Z with Matern covariance is an autore- 
gressive process. As detailed in appendix A, this connection allows to use previous results 
from the literature to derive the expression of the inner products of the associated RKHS: 

Matern 1/2 (exponential kernel) 

<<?, % 1/2 = 2^ f a + 9') (jjh + ft') dt + ±g(a)h(a) (13) 
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Matern 3/2 



M ^ = W^L {(p g+2 ir g+g ){ 



^h + 2^h' + h"\dt 



2 



1 9 

+ ^g(a)h(a) + ^g'(a)h'(a) 



(14) 



a 



Matern 5/2 



(^ 5/2 = y Lt^LtWdt + — 2 g(a)h(a) + ^^ g ( a )"h''(a) 



(15) 



30 2 /ll 
+ ^ igf(a)h'(a) + -g"(a)h(a) + -g(a)h"(i 

Although these expressions are direct consequences of Doob [1953] and Hajek [1962] they 
cannot be found in the literature to the best of our knowledge. 



Another field that is closely related to Matern kernels is Sobolev spaces. As stated 
in Porcu and Stein [2012, Theorem 9.1] and Wendland [2005], the RKHS generated by 
k coincides with the Sobolev space W^ 1 ^ 2 . This will be particularly useful in the next 
section to show that sine and cosine functions belong to the RKHS. 

Scheuerer et al. [2011] point out that Sobolev spaces are intuitively more accessible than 
RKHS (it is often straightforward to tell if a function belongs or not to W2) but RKHS 
offer a good framework for deriving an approximation of / based on the observations f(xi). 
As a consequence, Matern RKHS are very interesting for modelling since they benefit from 
both assets: the Sobolev structure of H allows to understand the assumptions on / (for 
example, v is directly linked to differentiability of /) and the RKHS properties give a 
compact expression for the optimal predictor. 



2 Kernels of periodic subspaces 

2.1 Fourier basis in RKHS 

In this section, we will see how to extract the subspace of 27r-periodic functions in a RKHS 
%. We will assume here that H has a Matern kernel where v is half- integer. However, the 
method presented here can be applied to any RKHS as long as the Gram matrix associated 
to a periodic basis can be computed. For a detailed list of RKHS inner products we refer 
the reader to [Berlinet and Thomas- Agnan, 2004, Chap. 7]. 
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One popular basis for a space of periodic functions is the Fourier basis (sin(a;), cos(x), 
sin(2:r), cos(2x), . . . ). Hereafter, we consider a truncated version of this basis, ignoring the 
frequencies higher than q 

F(x) = (sin(a;), cos(rc), . . . , sin(gx), cos(gx)) T , (16) 

and we denote by % p the space spanned by this basis. The fact that "H coincides with 
W2 +1 ^ 2 ensures that the elements of H are the functions such that 

• the i th derivatives (0 < i < v — 1/2) are absolutely continuous and square integrable, 

• the (v + l/2) th derivative is defined almost everywhere and is square integrable. 

As a consequence, we have "H p C "H since all the functions of the basis are infinitely 
differentiable. Let G be the Gram matrix of F in "H: Gjj = (Fj,Fj)^. Similarly to Eq. 8, 
it is straightforward to show that 

k p (x,y) = F T (x)G- 1 F(y) (17) 

is the reproducing kernel of H p . Hereafter, we will refer to k p as the periodic kernel. 



The matrix G can be computed from the expression of the inner product given in Eqs. 13- 
15. In contrast to the Gram matrix of the Fourier basis in L 2 , G is not a diagonal matrix if 
the length of D is a multiple of the period. One essential property for the practical use of 
periodic kernels is that the computation of the elements of G can be performed analytically. 
Indeed, all the elements of the basis can be written in the form cos(cux + ip). Using the 
notation L x for the linear operators in the inner product integrals (see Eq. 15) we obtain: 

L x (cos(ux + </?)) = ai cos(ux + (p)^ = ctiUJ 1 cos ( ux + tp + — j . (18) 

i i ^ ' 

The latter can be factorised in a single cosine pcos(ux + 0) with 

/ 2 , 2 , / arcsin (r s /p) if r c > 

P=Vr 2 c +r 2 s , <P= I . (19) 

] arcsm (r s / p) + ix it r c < 

where r c = c^u 1 cos ( <p + — j and r s = aiU 1 sin ( p + — J . 

i ^ ' i ^ ' 

Eventually, the computation of the inner product boils down to the integration of a 
product of two cosines, which can be solved by linearisation. 
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2.2 Tuning the period 

We assumed previously a 27r-periodicity for the signal. However this period can be 
modified by introducing a parameter A in the definition of the Fourier basis: 



F A (x) = 




As for the other parameters of the kernel a 2 and 9, maximum likelihood estimation can be 
used to obtain a value of A well suited to the data. This estimation of the period will be 
illustrated in the case study of the next section. 



2.3 Application to a benchmark 

We will now illustrate on a benchmark of test functions the use periodic kernels for GP 
modelling. Furthermore, we will compare the resulting models with COSOPT [Straume, 
2004] and ARSER [Yang and Su, 2010] which are representative of the methods commonly 
used in biostatistics for detecting periodically expressed genes [Hughes et al., 2009, Amaral 
and Johnston, 2012]. 



COSOPT assumes the following model for the signal: 

y(t) = a + /3t + 7cos(wt + <p) +e, (21) 

where e corresponds to some white noise. The algorithm proceeds in two steps to determine 
the values of a, (3, 7, u and (p. First, a linear regression model is fitted to estimate the value 
of a, (3. The linear trend is then subtracted from the signal and the remaining parameters 
are fitted by minimizing the mean square error. 



The underlying model is more sophisticated for ARSER since it accounts for various 
frequencies: 

y(t) = a + f3t + J2li cos(oOit + (p t ) + e. (22) 

i 

The number of cosine terms and their frequencies Ui are obtained by detecting the peaks of 
the spectrum via the fast Fourier transform. In practice, this number is typically around 
1-5. As previously the linear trend is initially subtracted to the data and an additional 
smoothing is performed to limit high frequencies due to noise. Although this model is more 
flexible, it has two drawbacks: the input points are assumed to be regularly spaced and 
the resulting model is not necessarily periodic. 
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In addition, we introduce the following Gaussian process model: 

Y(t) = a + pt + Y p (t) +e. (23) 

where Y p has periodic kernel k p . Here, a and (3 should be interpreted as random variables 
with Gaussian distribution jV(0, 1). The best predictor associated with this model given 
by Eq. 9) where k(x,y) = 1 + xy + a p k p (x,y) + r 2 5(x,y). The parameters a p , 9, A and 
t 2 are obtained by maximising the likelihood of the observations. This is equivalent to 
minimizing -2 times the log-likelihood: 

£ = nlog(27r)+log|K|+Y T K- 1 Y (24) 

which depends on all the parameters of the kernel through the matrix K. The number of 
frequencies in the Fourier basis is set to q — 20. The best predictor associated with this 
model given by Eq. 9 where k(x, y) = 1 + xy + k p (x, y) + r 2 5 Xty . Although this information 
is readily available, we will not use in this benchmark the prediction variance provided by 
the GP models. 

COSOPT and ARSER also include additional features for measuring the periodicity 
of a signal in term of p-value or false discovery rate. The probabilistic framework of 
Gaussian processes could be used to derive such statistics for the proposed model but these 
developments fall out of the scope of the present article. 

The prediction of these models are compared on a benchmark of 1-periodic test functions 
defined over [0,3]: cos(27rt), sumcos(t) = l/2(cos(27rf) + cos(4-7rt)), square(t), triangle(t), 
diag(t) and noise (t) which are represented in Figure 1. A training set of 50 equally spaced 
test points is used for learning these functions and a A/"(0,0.1) observation noise is added 
to each observation, except for noise where the perturbations are Af(0, 1). As the test 
functions do not include any linear trend, the value of j3 is fixed to zero for all models. 

The models fitted with COSOPT, ARSER and the periodic GP can be compared in 
Figure 2. To asses the overall precision, we repeat the fitting procedure 50 times with 
different values of the observation noise. The root mean square error (RMSE) is computed 
based on a 500-point test set spanning [0, 3]. A summary of the obtained result is given in 
Table 1. 

Many remarks can be formulated based on the observation of Figure 2 and Table 1. First, 
COSOPT gives a good fit for the cosine function, but also for the triangular test function. 
This can be explained by the overall cosine shape of the latter. The noise filtering can be 
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Figure 1: Test functions considered in the benchmark. The crosses indicate the observed 
values after adding the random noise. 




Figure 2: Plots of the test functions with associated fitted models. For an improved 
visibility, the plotting region is limited to one period. The periodic GP model is based on 
a periodic Matern kernel with regularity v = 3/2. 
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test function 


COSOPT 


ARSER 


GP v = 1/2 


GP v = 3/2 


GP v = 5/2 


cos 


0.09 (0.03) 


0.23 (0.03) 


0.16 (0.03) 


0.11 (0.03) 


0.10 (0.03) 


sumcos 


0.36 (0.01) 


0.24 (0.08) 


0.19 (0.09) 


0.14 (0.04) 


0.13 (0.04) 


square 


0.60 (0.01) 


0.37 (0.03) 


0.31 (0.05) 


0.32 (0.04) 


0.32 (0.03) 


triangle 


0.11 (0.02) 


0.23 (0.03) 


0.15 (0.03) 


0.12 (0.03) 


0.12 (0.03) 


diag 


0.36 (0.01) 


0.33 (0.04) 


0.26 (0.04) 


0.26 (0.03) 


0.26 (0.03) 


noise 


0.40 (0.06) 


0.73 (0.11) 


0.44 (0.20) 


0.39 (0.19) 


0.37 (0.21) 


mean 


0.32 


0.36 


0.25 


0.22 


0.22 



Table 1: Mean value (and standard deviation) of RMSE for each test function and model. 
The best fit is indicated in italic. The models within one standard deviation from the best 
result are indicated in bold. 



judged satisfactory for this model but, as expected, the model fails to approximate non- 
sinusoidal patterns such as square and sumcos. The wider range of frequencies allowed in 
ARSER makes it capable of approximating these more complicated patterns. The drawback 
for this model is its sensitivity to noise. Indeed high frequencies oscillations corresponding 
to noise overfitting can be observed on ARSER models. Although some functions in the test 
set are typically difficult to approximate with Gaussian process models due the presence 
of discontinuity, the models based on periodic kernels perform remarkably well on this 
benchmark. On the one hand, the large number of frequencies considered in the truncated 
Fourier basis allows a good fit of non-sinusoidal patterns. On the other, the embedding of 
this basis into a Matern RKHS naturally imposes a penalty on the high frequencies which 
results in a good filtering of the noise. 

Note that the results obtained for the periodic GP models are not specific to the class 
of periodic kernel introduced in this article. Usual periodic kernels such as k(x,y) = 
exp (— (sin(|x — y\) 2 ) (see Rasmussen and Williams [2006] for more details) would probably 
lead to similar results. We will detail in the next section one particular asset of the proposed 
kernels in term of decomposition of the signal. 

3 Decomposition of models 

3.1 Decomposition of kernels 

The difference of two kernels is generally not a valid covariance function. However the 
construction of k p ensures that, in this particular case, k a = k — k p corresponds to a kernel. 
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(a) Matern kernel k. (b) periodic sub-kernel k p . (c) aperiodic sub-kernel k a . 

Figure 3: Examples of decompositions of a Matern 3/2 kernel as a sum of a periodic and 
aperiodic sub-kernels. The three graphs on each plot correspond to a different value of 
the lengthscale parameter 9. For this example the input space is D = [0, 4tt], the cut-off 
frequency is q = 20 and one of the variables of the kernels is arbitrarily fixed to 5. 



This is straightforward to see using the RKHS framework since k a is the reproducing kernel 
of the orthogonal complement of l-i p in % [Berlinet and Thomas- Agnan, 2004]. As this 
space is orthogonal to the (truncated) Fourier basis, it will be referred to as the subspace 
of aperiodic functions (hence the subscript a). From the probabilistic point of view, this 
decomposition corresponds to the decomposition of Z as a sum of two independent Gaussian 
processes, with covariance functions k p and k a . This can be summarized as follow: 

k = k p + k a , H = H P + H a , Z = Z p + Z a . (25) 
An illustration of the decomposition of Matern 3/2 kernels can be found in Figure 3. 



3.2 Periodic and aperiodic sub-models 

The expressions of Eq. 25 allow decomposition of the best predictor as a sum of two 
sub-models m p and m p : 

m(t) = E[Z p (t) + Z a {t)\Z{ti) = Vi ] 

= E[Z p (t)\Z(t t ) = y t ] + E[Z a (t)\Z(ti) = yi ) (26) 
= k p (t) T K- 1 y + k a (t) T K- 1 y- 
Similarly, prediction variances are associated to the sub-models 

v p (t) = Vsx[Z 9 (t)\Z{U) = Vi ] = k p (t,t) - k p (t) T K- 1 k p (t) 
v a (t) = Vai[Z a (t)\Z(U) = yi ] = k a (t,t) - k a (t) T K- l k a (t). 
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However, contrarily to Eq. 26, we have v(t) ^ v p (t) +v a (t) since Y p and Y a are not indepen- 
dent knowing the observations. For a detailed discussion on the decomposition of models 
based on a sum of kernels see Durrande et al. [2012]. 

The sub-models can be interpreted as usual GP models with correlated noise. For ex- 
ample, trip is the best predictor based on kernel k p with an observational noise given by 
K a . For the RKHS framework, m p and m a correspond to the solution of a regularization 
problem and they respectively belong to % p and T-L a . 

We now illustrate this model decomposition on the Mauna Loa Observatory dataset 
[Keeling et al., 2009] which is frequently used in modelling [Rasmussen and Williams, 2006, 
Wilson and Adams, 2013]. This dataset contains the monthly average of C0 2 concentration 
in the atmosphere since 1958, expressed in micromol of C0 2 per mol of dry air. Hereafter 
we will focus on the first six years of the time series, using the initial 48 time points as 
training data and predicting for the following 24 months. For this dataset we will assume 
that the one-year period is known. 

We first consider a GP regression model based on a regular Matern 3/2 kernel, with 
maximum likelihood estimation of a 2 , 9 and r 2 . Figure 4 represents the decomposition of 
the model as detailed in Eqs. 26-27. It can be seen that the periodic sub-model successfully 
extracts the periodic component. Although this model gives very accurate predictions in 
the training region it drastically fails to forecast the behaviour of the signal after the last 
observation. We will now see how to improve this result using the sub-kernels. 

3.3 Parametrisation of the kernel 

A Matern kernel k initially depends on three parameters: the regularity v, its variance 
a 2 and its lengthscale 6. However, the decomposition k = k p + k a allows us to set the values 
of those parameters separately for each sub-kernel in order to increase the flexibility of the 
model. The new set of parameters of k is then (z/ p , a 2 , 9 p , u a , a 2 , 6 a ), to which A may be 
added if the period is unknown. 

After reparametrisation, k belongs to a larger family of kernels that encapsulates the 
Matern one. Furthermore, if v p = u a , and a 2 , a 2 ^ the RKHS generated by k and the 
one associated with a Matern kernel with equal regularity correspond to the same space, 
but endowed with a different norm. 
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(a) global model m. (b) periodic sub-model m p . (c) aperiodic sub-model m a . 

Figure 4: Decomposition of a model based on the Mauna Loa Observatory dataset. The 
model is trained on the 48 data-points contained in the left part of the graph. The kernel 
is Matern 3/2, and the cut-off parameter of the Fourier basis is set to q — 20. The shaded 
area corresponds to 95% confidence intervals and the test function is represented in red. 
The small increase of the confidence interval width in the left of panel a is due to missing 
data, which is naturally supported by GP models. 

The graphs presented in Figure 5 show the obtained model after estimating (o"p, 9 p , a^, 9 a ), 
the regularities (u p , v a ) being fixed to 3/2. In this example, adding two parameters dras- 
tically improves the fit of the test function outside the observation region. The global 
behaviour of the phenomenon is successfully captured by the model which is capable of 
reproducing both the small scale patterns (oscillations) and the large scale trend. One 
limitation here is that the regularity parameter of the periodic and aperiodic sub-models 
is assumed to be the same whereas observation of data suggests a smaller differentiability 
order for the periodic part. 

4 Measuring the periodicity 

The decomposition of the model into a sum of sub-models can be used for estimating 
a ratio of periodicity of the signal. In sensitivity analysis, a common approach for mea- 
suring the effect of a set of variables (x\, . . . ,x n ) on the output of a multivariate function 
f(x\, . . . ,x n ) is to introduce a random vector X = (Xi, . . . ,X n ) with values in the input 
space of / and to define the variance explained by one subset of variables x/ = (xj 1 , . . . , Xi m ) 
as Vi = Var (E (/(X)|X/)) [Oakley and O'Hagan, 2004]. Furthermore, the probabilistic fea- 
tures of the GP model can be taken into account by computing the indices based on random 
paths of the conditional GP [Marrel et al., 2009]. 
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Figure 5: Model and sub-models after parametrisation of the kernel by (a p , 9 P , o 2 a , 6 a 
The test points and the other settings are the same as in Figure 4. 



We now apply these two principles to define a periodicity ratio based on the sub-models. 
Let T be a random variable defined over the input space and Z P1 Z a be the periodic and 
aperiodic components of the conditional GP Z knowing it interpolates the data-points. Z p 
and Z a are normally distributed with respective mean and variance (m p , v p ), (m a , v a ) 
and their covariance is given by Cov (Z p (t), Z a (t')) = — k p (t) T K _1 k a (t / ). To quantify the 
periodicity of the signal we introduce the following periodicity ratio: 



R 



Var T [Z p (T)] 



Var T [Z p (T) + Z a (T)] 



(2* 



Note that R does not correspond to the percentage of periodicity of the signal in a rig- 
orous way since the dependence between Z p and Z a implies Var T [Z'(T)] ^ Var T [Z p (T)] + 
Var T [Z a (T)]. 



5 Application to gene expression studies 

The 24 hour cycle of days can be observed in the oscillations of biological mechanisms 
at many scales. This phenomenon, called circadian rhythm, can for example be seen at 
a microscopic level on gene expressions. The cellular mechanism ensuring this periodic 
behaviour is called the circadian clock. For Arabidopsis, which is a widely used organism in 
plant biology and genetics, the study of the circadian clock at a gene level shows an auto- 
regulatory system involving several genes [Ding et al., 2007]. As advocated in Edwards 
et al. [2006], it is believed that the genes involved in the oscillatory mechanism have a 
cyclic expression so the detection of periodically expressed genes is of great interest for 
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completing current models. As stated in the introduction, this application is the one that 
motivated the work presented in this article. 

The mechanism allowing genes to interfere in the functioning of the cell can be sum- 
marised as follows: DNA is first duplicated into messenger RNA, and this RNA is then 
used for protein synthesis. To quantify the expression of a specific gene it is thus possible 
to measure the concentration of RNA molecules associated with this gene. Microarray 
analysis and RNA-sequencing are two examples of methods that take advantage of this 
principle. 

The dataset we consider here has been initially studied by Edwards et al. [2006] 1 . It 
corresponds to gene expression for nine day old arabidopsis seedlings. After eight days 
under a 12h-light/12h-dark cycles, the seedlings are transferred into constant light. A 
microarray analysis is performed every four hours, from 26 to 74 hours after the last 
dark-light transition, to monitor the expression of 22810 genes. Edwards et al. [2006] 
use COSOPT [Straume, 2004] for detecting periodicity genes and identify a subset of 3504 
periodically expressed genes, with an estimated a period between 20 and 28 hours. 

We now apply to this dataset the method described in the previous sections. The kernel 
we consider is a sum of a periodic and aperiodic Matern 3/2 kernel plus a delta function 
to reflect observation noise: 

fc(M') = a 2 p k p (t,t') + a 2 a k a (t,t')+T 2 5(t,t'). (29) 

Although the cycle of the circadian clock is known to be around 24 hours, circadian rhythms 
often depart from this figure (indeed circadian is Latin for around a day) so we introduce 
a parameter A as in Sec. 2.2 to estimate the actual period. The final parametrisation 
of k is based on six variables: (a 2 ,9 p ,al,9 a ,r 2 ,\). For each gene, the values of these 
parameters are estimated using maximum likelihood. The optimization is based on the 
standard options of the GPy toolkit with the following boundary limits for the parameters: 
a p , o a > 0; 6 P , 9 a G [10, 60]; r 2 G [10~ 5 ,0.75] and A G [20, 28]. Furthermore 50 random 
restarts are performed for each optimization to limit the effects of local minimums. 

Eventually, the periodicity of each model is assessed with the ratio R given by Eq. 28. 
As this ratio is a random variable, we approximate the expectation of R with the mean 
value of 1000 realisations. To obtain results comparable with the original paper on this 

1 The original dataset is available online at http://millar.bio.ed.ac.uk/data.htm. 
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Vcosopt 



Vcosopt 



2127 1377 



Vgp Vgp 



1377 17929 
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COSOPT 



Table 2: Confusion table as- 



Figure 6: Estimated periods for the genes in 
Vgp H V cosopt- The coefficient of determi- 
nation of x — > x (dashed line) is 0.69. 



sociated to the predictions 
by COSOPT and the pro- 



posed GP approach. 

dataset, we label as periodic the set of 3504 genes with the highest periodicity ratio. The 
cut-off periodicity ratio associated with this quantile is 0.77. 

Let Vcosopt and Vgp be the sets of selected periodic genes respectively by Edwards et al. 
[2006] and the method presented here. The overlap between the two sets is summarised 
in Table 2 where S denotes the complement of a subset S. Although the results cannot 
be compare to any ground truth, the methods seem coherent since 88% of the genes share 
the same label. Furthermore the estimated value of the period A is consistent for the genes 
labelled as periodic by the two methods, as seen in Figure 6. 

One interesting comparison between the two methods is to examine the genes that are 
classified differently. The available data from Edwards et al. [2006] allows focusing on 
the worst classification mistakes made by one method according to the other. This is 
illustrated in Figure 7 which shows the behaviour of the most periodically expressed genes 
in Vgp according to COSOPT and, conversely, the genes in Vcosopt with the highest 
periodicity ratio R. Although it is undeniable that the genes selected only by COSOPT 
(panel a) present some periodic component, they also show a strong non-periodic part, 
corresponding either to noise or trend. For these genes, the value of the periodicity ratio is: 
0.74 (0.10), 0.74 (0.15), 0.63 (0.11), 0.67 (0.05) (means and standard deviations, clockwise 
from top left) which is close to the classification boundary. On the other hand, the models 
suggested only by the GP approach show a strong periodic signal (we have for all genes 
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Figure 7: Examples of genes with different labels. The selected genes correspond to the 
four genes with the highest periodic part according to the method that label the gene as 
periodic. The titles of the graphs correspond to the name of the genes (AGI convention). 
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R = 1.01 (0.01)) with sharp spikes. Another interesting fact of panel b is that there is at 
least one observation associated with each spike which suggests that the behaviour of the 
model should not be interpreted as overfitting. 

This few elements of comparison on a real life case study show some very promising 
results, both for the capability of the proposed method to handle large datasets and for the 
quality of the results. Furthermore we believe that the spike shape of the newly discovered 
genes may be of particular interest for understanding the mechanism of the circadian clock. 
The full results, as well as the original dataset can be found in the supplementary materials. 

6 Conclusion 

The main purpose of this article is to introduce a new approach for estimating and 
extracting the periodic part of a function / given some observations f(xj) = y^. As often, 
the proposed method corresponds to the orthogonal projection onto a basis of periodic 
functions. The originality here is to perform this projection in some RKHS where the partial 
knowledge given by the observations can be dealt with elegantly. Previous theoretical 
results from the mid- 1900s allowed us to derive the expressions of the inner product of 
RKHS based on Matern kernels. Given these results, it was then possible to define a 
periodic kernel k p and to decompose fcasa sum of sub- kernels k = k p + k a . 

We illustrated three fundamental feature of the proposed kernels for GP modelling. First, 
as we have seen on the benchmark examples, they allow to approximate non-sinusoidal 
patterns while retaining appropriate filtering of the noise. Second, they provide a natural 
decomposition of the GP model as a sum of periodic and aperiodic sub-models. Third, 
they can be reparametrised to define a wider family of kernel which is of particular interest 
for decoupling the assumptions on the behaviour of the periodic and aperiodic part of the 
signal. This approach has proved to increase considerably the prediction ability of the 
model on the Mauna Loa Observatory dataset. 

The probabilistic interpretation of the decomposition in sub-models is of great impor- 
tance when it comes to define a criterion that quantifies the periodicity of / while taking 
into account the uncertainty about it. This goal was achieved by applying methods com- 
monly used in GP based sensitivity analysis to define a periodicity ratio. 

Although the proposed method can be applied to any time series data, this work has 
originally been motivated by the detection of periodically expressed genes. In practice 
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listing such genes is a key step for a better understanding of the circadian clock mechanism 
at a microscopic level. The effectiveness of the method is illustrated on such data in 
the last section. The results we obtained are consistent with the literature but they also 
feature some new genes with a strong periodic component. This suggest that the approach 
described here is not only theoretically elegant but also efficient in practice. 

As a final remark, we would like to stress that the proposed method is fully compatible 
with all the features of Gaussian processes, from the combination of one-dimensional peri- 
odic kernels to obtain periodic kernels in higher dimension to the use of global optimisation 
routines such as EGO. 
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The following datasets are made available under the Public Domain Dedication and Li- 
cense vl.O whose full text can be found at: http : //www . opendatacommons . org/licenses/ 
pddl. 

Case study dataset: Original dataset with the gene expressions for each gene at each 
time point, (csv file) 

Case study results: File regrouping the available results from Edwards et al. [2006] and 
the one obtained in the application section. For both methods, the file gives the value 
of the criterion and the estimated period, (csv file) 



A.l Autoregressive processes and RKHS norms 

A process is said to be autoregressive (AR) if the spectral density of the kernel 
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can be written as a function of the form 

S{u>) = 



(31) 



where the polynomial YlT=o a k xk is re& l with no zeros in the right half of the complex 
plan Doob [1953]. Hereafter we assume that m > 1 and that oto, a m ^ 0. 

For such kernels, the inner product of the associated RKHS H is given by Hajek [1962], 
Kailath [1971], Parzen [1961] 

{h,g) n = [\L t h)(L t g)dt + 2 £ d j>k h^(a)g^(a) (32) 

^ a 0<j,k<m-l 

jr'+fc even 

m min(j,fe) 

where L t h = ^a fe /i (/c) (t) and d j>k = ^ (-lY J ~ l) aia j+k+1 -i. 

k=0 i=max(Oj+fc+l— n) 

We show in the next section that the Matern kernels correspond to autoregressive kernels 
and, for the usual values of z/, we derive the norm of the associated RKHS. 



A. 2 Application to Matern kernels 

Following the pattern exposed in Doob [1953, p. 542], the spectral density of a Matern 
kernel (Eq. 11) can be written as the density of an AR process when v + 1/2 is an integer. 
Indeed, the roots of the polynomial || + u 2 are conjugate pairs so it can be expressed as 
the squared module of a complex number 



2u 



+w )("-— ) 



CO + 
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(33) 



Multiplying by i and taking the conjugate of the quantity inside the module, we finally 
obtain a polynomial in ioo with all roots in the left half of the complex plan: 
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Plugging this expression into Eq. 11, we obtain the desired expression of S u : 
S u (u) = 



T(v)6 2 » 
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Using T(z/) 



one can derive the following expression of the coefficients a^- 



2 2^-i(V_i/2)!> 




2 (z/-l/2)! 2 2- u+1/2 \^ 



(2v-i)\w nk ( e 



fc-1/2 



(36) 



Theses values of can be plugged into Eq. 32 to obtain the expression of the RKHS 
inner product. The results for v G {1/2, 3/2, 5/2} is given by Eqs. 13-15 in the main body 
of the article. 
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